
Hypothesis Testing

The Null

The Null $H_0$ is usually “The World is as it is and my intervention had no effect.” It’s usually the most conservative position you can imagine. Your exertions had No Effect. Here are some typical Nulls:

| Test type | Typical $H_0$ | Typical $H_A$ |
|---|---|---|
| Mean difference | $\mu = 0$ | $\mu \ne 0$ |
| Risk ratio / odds ratio | RR = 1 | RR ≠ 1 |
| Hazard ratio (Cox) | HR = 1 | HR ≠ 1 |
| Correlation | $\rho = 0$ | $\rho \ne 0$ |
| Regression coefficient | $\beta = 0$ | $\beta \ne 0$ |

The Alternative Hypothesis $H_A$ is usually “Yeah my shit might have done something to The World” as measured by some estimate. You’re trying to see if you can falsify the Null (thanks Popper et al.).

Hark!

You never accept the Alternative/Research Hypothesis $H_A$! Falsifiability FTW! You either reject or fail to reject the Null Hypothesis $H_0$.

A Bounded Null

Do you ever set $H_0: \mu \ne \mu_0$ … ?

Nope. Think about what you’d say if you rejected the null in $H_0: \mu \ne 5$: “The mean is exactly 5.” That’s weird statistically and kinda philosophically.

And you’re testing the Null by calculating a sampling distribution under $H_0$ and asking “How surprising is my data?” You answer this question by computing all manner of Test Statistics (e.g. $z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}$) and then computing the p-Value.

So which value from $\mathbb{R}$ are you going to plug in that’s not $5$? Yep. Each one will give you a different distribution. Just don’t do it.
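Here’s a minimal sketch of that $z$ statistic in Python (standard library only). The sample, the known $\sigma$ and the helper name `z_test` are all made up for illustration. The thing to notice is that `mu0` has to be one specific number, and that each choice of it gives a different $z$ and a different p-value.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test, assuming the population sd `sigma` is known."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Probability of a |Z| at least this large if H0: mu = mu0 is true.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical measurements; sigma = 0.5 is an assumption, not an estimate.
sample = [5.9, 6.1, 5.4, 6.3, 5.8, 6.0, 5.7, 6.2]

z5, p5 = z_test(sample, mu0=5, sigma=0.5)  # H0: mu = 5
z6, p6 = z_test(sample, mu0=6, sigma=0.5)  # H0: mu = 6, same data, different null

print(f"mu0=5: z={z5:.2f}, p={p5:.2g}")
print(f"mu0=6: z={z6:.2f}, p={p6:.2g}")
```

Same data, two different point nulls, two very different answers. A null like “$\mu \ne 5$” gives you nothing to plug in for `mu0`.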

p-Value

If you want to make a falsifiable claim (thanks Popper) about The World, a p-value is as easy as this:

What is the probability of seeing what I saw in my experiment if the null hypothesis is true? [1]

78%? Well that sounds bad. You fail to reject the null. 5%? That’s small. Maybe something’s going on? 0.1%? Okay maybe something’s really going on. “Something” here means association, not causation.
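To make that concrete, here’s a small sketch with a made-up experiment: you flip a coin 100 times and see 60 heads, and the null is “the coin is fair.” The function name `fair_coin_p_value` is hypothetical. It adds up the exact binomial probability of every outcome at least as far from 50 heads as the one you saw, which is the “at least as extreme” wording from the footnote.

```python
from math import comb

def fair_coin_p_value(heads, flips):
    """Exact two-sided p-value under H0: the coin is fair (p = 0.5)."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count the equally likely sequences whose head count is at least as far
    # from the expected count as what was observed.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    return extreme / 2 ** flips

p = fair_coin_p_value(60, 100)
print(f"p = {p:.3f}")  # about 0.057
```

So 60 heads out of 100 lands just above the usual 5% line: a bit surprising under the null, but not enough to reject it at that threshold.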

Confidence Intervals

You’ve seen them. “RR = 1.5 95% CI [1.3,1.6]”. What do they mean? Do they mean that you’re 95% sure the true value is somewhere in there?

Nope! Common mistake [2]. You’re saying that if you repeated your experiment many times and built an interval each time, the intervals would ‘wiggle’ (different sample, other rando effects), but about 95% of them would contain the true value. That’s all.
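One way to see the repeated-experiment reading is to simulate it. The sketch below is an illustration under made-up assumptions: a normal population with a known true mean and a known $\sigma$, and a hypothetical helper `coverage`. It reruns the “experiment” thousands of times, builds a 95% interval each time, and counts how often the interval contains the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def coverage(true_mu=10.0, sigma=2.0, n=30, reps=5000, level=0.95, seed=0):
    """Fraction of simulated z-intervals that contain the true mean."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z_crit * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        # One fresh "experiment": a new sample and a new interval around its mean.
        xbar = mean([rng.gauss(true_mu, sigma) for _ in range(n)])
        if xbar - half_width <= true_mu <= xbar + half_width:
            hits += 1
    return hits / reps

rate = coverage()
print(f"intervals containing the true mean: {rate:.1%}")
```

The true mean never moves here. The intervals do, and roughly 95% of them catch it.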

Crossing the Null

“Which Test?” TLDR

To pick a test, generally speaking, you’ll be asking:

  • What is the nature of my Data [3]? Continuous? Categorical?
  • How many groups am I dealing with? One, two, or more than two?

Here’s a nice little table from this excellent video (by a Columbia alum!)

| | 1 Group | 2 Groups | 2+ Groups |
|---|---|---|---|
| Categorical Data | Proportion Test ($Z$-test approx.); $\chi^2$ Test | Proportion Test ($Z$-test approx.); $\chi^2$ Test | $\chi^2$ Test |
| Continuous Data | $Z$-test & Variants; $t$-test & Variants | $Z$-test & Variants; $t$-test & Variants | ANOVA ($F$-test, 1-way, 2-way) |
| Classic Assumptions Violated [4] | Sign Test; Signed Rank Test | Wilcoxon–Mann–Whitney Test; Paired $t$-test; McNemar’s Test | Kruskal–Wallis Test |
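If you like your cheat sheets executable, the two questions above can be sketched as a plain lookup. This is only an illustration: the keys, the function name `suggest_tests`, and the way the cells are split in the last row are my reading of the table, not an official taxonomy.

```python
# Keys are (kind of data, number of groups); 3 stands for "more than two".
TESTS = {
    ("categorical", 1): ["Proportion Test (Z-test approx.)", "Chi-squared Test"],
    ("categorical", 2): ["Proportion Test (Z-test approx.)", "Chi-squared Test"],
    ("categorical", 3): ["Chi-squared Test"],
    ("continuous", 1): ["Z-test & Variants", "t-test & Variants"],
    ("continuous", 2): ["Z-test & Variants", "t-test & Variants"],
    ("continuous", 3): ["ANOVA (F-test, 1-way, 2-way)"],
    ("assumptions_violated", 1): ["Sign Test", "Signed Rank Test"],
    ("assumptions_violated", 2): [
        "Wilcoxon-Mann-Whitney Test",
        "Paired t-test",
        "McNemar's Test",
    ],
    ("assumptions_violated", 3): ["Kruskal-Wallis Test"],
}

def suggest_tests(data_kind, n_groups):
    """Return the candidate tests for a kind of data and a number of groups."""
    if n_groups < 1:
        raise ValueError("need at least one group")
    return TESTS[(data_kind, min(n_groups, 3))]

print(suggest_tests("continuous", 4))
```

It only narrows the field. You still have to check the assumptions of whatever test it hands you.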

Footnotes

  1. “The likelihood of obtaining results at least as extreme as the ones observed, assuming that the null hypothesis is true”. I’ve never liked this as a starter definition.

  2. That’s a Bayesian credible interval.

  3. Always seek to understand your Data all the time 🙏

  4. Too many outliers, small sample size, correlated observations